Photosynthesis is one of the most important biological processes on Earth. Although
the molecular mechanisms underlying photosynthesis are well understood, the regulatory
networks of photosynthesis remain poorly studied. Given the current interest in
improving photosynthetic efficiency for greater crop yield, elucidating the detailed
regulatory networks controlling the construction and maintenance of the photosynthetic
machinery is not only scientifically significant but also holds great potential for
agricultural application. In this study, we first identified transcription factors
(TFs) related to photosynthesis through the TRAP approach using position weight matrix
information. Then, for the TFs related to photosynthesis, interactions between them and
their targets were determined by the ARACNE approach. Finally, a gene regulatory network
was established by combining the TF-target information generated by these two approaches.
Topological analysis of the regulatory network suggested that (a) the regulatory network
of photosynthesis has a "small world" property; and (b) there is substantial coordination,
mediated by transcription factors, between different components of photosynthesis.



Keywords: photosynthesis, transcription factors, regulatory network, small world, coordination 



INTRODUCTION 

In recent years, a growing body of research has shown that improving
photosynthetic efficiency is a major viable approach to further
increase crop productivity for enhanced food and fuel production
(Zhu et al., 2010). Accordingly, much research has been devoted
to studying the molecular mechanism of photosynthesis and identifying
potential options to further optimize the photosynthetic machinery
(Zhu et al., 2010). A number of engineering targets have
indeed been identified, such as increasing the expression of SBPase
(Lefebvre et al., 2005) and manipulating the speed of recovery from
the photo-protected state (Zhu et al., 2004). Great efforts are now
being undertaken to engineer these targets in different crops to improve
photosynthetic efficiency.

In most current research on improving photosynthesis,
individual components that can potentially increase photosynthesis
are identified and used as targets for engineering.
This approach has had some success, as in the case
of SBPase, whose over-expression increased photosynthesis
and biomass production (Lefebvre et al., 2005; Zhu et al.,
2007). However, this approach usually does not consider the
inherent interactions between different components of the
photosynthetic machinery, although much evidence suggests that
such interactions are substantial. As a result, altering the
expression level of one gene might change the expression levels
of many other photosynthetic genes simultaneously. For example,
decreasing the expression level of the Rubisco small subunit led to
changes in the phosphoribulokinase activity of the Calvin-Benson
cycle in tobacco (Hudson et al., 1992). Knock-down of Rieske FeS
decreased the concentrations of the cytochrome b6f complex and
Rubisco in tobacco (Price et al., 1998). In C4 plants, mutation of
Zmhcf136 caused loss of PSII complexes and grana thylakoids in
mesophyll cells and simultaneous changes in expression patterns in
the bundle sheath and mesophyll cells, including differential
levels of several C4 genes (e.g., PEPC and CA) (Covshoff et al.,
2008). Over-expression of C4 PEPC in rice also changed the
expression levels of Rubisco and FBPase (Agarie et al., 2002).

The close interaction among photosynthetic components
is also reflected in the coordinated expression of different
components of the photosynthetic machinery. In most C3 plants,
surveys of photosynthetic physiological parameters showed that
the maximal rate of Rubisco-limited photosynthesis (Vcmax) and
the maximal rate of RuBP-limited photosynthesis (Jmax) are strongly
correlated with each other (Wullschleger, 1993), suggesting that
expression of the genes underlying these parameters is
highly coordinated. In maize, a typical C4 plant, the bundle
sheath and mesophyll cells have distinct patterns of protein
accumulation; e.g., compared to mesophyll thylakoids, the thylakoids
in the bundle sheath cells have a 55% reduced PSII content,
an unchanged ATP synthase content, a 65% increased PSI content, and
a 33% increased cytb6f content (Majeran et al., 2008). This stable,
coordinated expression of different components of photosynthesis
in C4 plants also suggests that they are regulated by delicate
regulatory networks.

The close coordination of expression of photosynthetic
components has great physiological significance; in particular, it
is critical for maintaining a high resource use efficiency of
photosynthesis under different conditions. This has been demonstrated
in a number of cases. For example, under elevated CO2
conditions, the level of soluble rbcS protein gradually
decreases in rice (Chen et al., 2005), which is consistent with
the adjustment required for enhanced photosynthetic light and
nitrogen use efficiency (Long et al., 2004). Therefore, understanding
the molecular mechanisms underlying these coordinated
changes in the expression of different components of photosynthesis
under different conditions will not only be of scientific
significance but also have great potential for agricultural application.

Unfortunately, so far only a limited number of transcription
factors related to the regulation of photosynthetic genes have been
discovered, mainly through traditional forward genetic studies
(Saibo et al., 2009). Research on constructing the genetic regulatory
network of photosynthesis from high-throughput transcriptomic
and genomic data is mostly lacking. This is in sharp
contrast to the rapid progress in constructing regulatory
networks for other plant processes, e.g., the circadian clock
and flowering control (Keurentjes et al., 2007; Ma et al., 2007;
Thai et al., 2007; Long et al., 2008; Lee et al., 2010).

In this study, we aim to construct a genetic regulatory network
using transcriptomic and genomic data. Specifically, we
combined two existing bioinformatics algorithms, TRAP
(Roider et al., 2007) and ARACNE (Basso et al., 2005;
Margolin et al., 2006), to construct the gene regulatory
network. We used Arabidopsis thaliana as the model species
because its genomic sequence is available and a large
amount of transcriptomic data exists for it. The TRAP algorithm was
developed to predict downstream target genes of TFs by calculating
the binding affinity between transcription factors and DNA
fragments, and has been shown to have high accuracy in previous
studies (Roider et al., 2007). ARACNE is software developed
to find associations between genes based on mutual
information computed from expression data (Basso et al., 2005;
Margolin et al., 2006). This study identified a number of new candidate
transcription factors as regulators of photosynthesis, together
with some TFs reported earlier. Though the resulting genetic
regulatory network is still small in scale, it already shows
the "small world" property (Braha and Bar-Yam, 2004). We also
found evidence suggesting that TFs play a crucial role in
coordinating the expression of different components of photosynthesis.

MATERIALS AND METHODS 
WORKFLOW OF THE WHOLE PROJECT 

The workflow of our study is shown in Figure 1A. We
first applied the TRAP algorithm to calculate the binding
affinities between TFs and DNA fragments in the promoter regions
of Arabidopsis genes. TFs involved in photosynthesis were then
identified through a modified Fisher's test. After that, we further
identified the interactions between these TFs and their target
genes using the ARACNE algorithm based on transcriptomic
data. Finally, TF-target pairs identified by both algorithms were
selected as edges of the gene regulatory network of photosynthesis
in Arabidopsis. Here we describe in detail the major algorithms
involved in this workflow.

Collecting and grouping photosynthetic genes 

We first collected the pathways associated with photosynthesis
(map00195 and map00710) for Arabidopsis from the KEGG database
(http://www.genome.jp/kegg/). The genes contained in these
pathways were used as photosynthetic genes. These genes include
all enzymes involved in the Calvin-Benson cycle, ATP synthesis,
components involved in electron transfer, the light reactions, and
genes related to C4 photosynthesis. Though Arabidopsis does not
operate C4 photosynthesis, all C4 photosynthesis-related genes
exist in Arabidopsis. Altogether, 124 photosynthesis-related genes
or isoforms were used in our study. These genes were categorized
into the following groups according to their biological functions:
the Calvin Cycle (CC), Photosystem II (PSII), Photosystem I
(PSI), Light Harvesting Complex (LHC), Photosynthetic Electron
Transport (PET), Cytb6/f, F-type ATPase (FTA), and C4-related
genes (C4).

Predicting interactions between TFs and candidate genes using the 
TRAP algorithm 

In this study, we defined the genomic region from 1000 bp
upstream to 500 bp downstream of the transcription start site
(TSS) of a gene as the promoter region. The promoter region
sequences of all Arabidopsis genes were downloaded from the
Phytozome database (http://www.phytozome.net). We then collected
all plant transcription factors and their corresponding
position weight matrices (PWMs) from the TRANSFAC database
(Matys et al., 2003). As a result, 124 TFs and their corresponding
PWMs were obtained for further study. The promoter region
sequences and PWMs were used as input to the Transcription
Factor Affinity Prediction (TRAP) algorithm to predict
interactions between TFs and candidate genes.

Given a PWM (of length W) for a certain TF and a promoter
sequence (of length L) of a potential target gene, the binding
affinity score (N) is computed as the sum of contributions from
all possible sites (l) in the promoter sequence through the following
equation:

$$N = \sum_{l=1}^{L-W} p_l = \sum_{l=1}^{L-W} \frac{R_0\, e^{-\beta E_l(\lambda)}}{1 + R_0\, e^{-\beta E_l(\lambda)}} \tag{1}$$

where $p_l$ is the binding probability of site $l$, and $R_0$ is defined by
the following equation:

$$R_0 = K(S_0) \cdot [\mathrm{TF}] \tag{2}$$

Here, $K(S_0)$ is an equilibrium constant. In Equation (2), the
energy E is set to be zero, and [TF] denotes the concentration
of the transcription factor (Roider et al., 2007). Given a
transcription factor, $\beta E_l(\lambda)$ describes its binding energies, where the
parameter λ is used to scale the mismatch energies in units of
thermal energy (Roider et al., 2007).
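
As a rough illustration of how Equation (1) accumulates per-site binding probabilities, the sketch below scans a promoter with a small PWM. It is not the TRAP implementation: the mismatch-energy function `site_energy` (a log-ratio of PWM frequencies divided by λ), the toy PWM, and the value of `r0` are assumptions made only for this example.

```python
import math

def site_energy(pwm, site, lam):
    """Assumed mismatch energy: sum over positions of ln(max_freq / freq) / lam."""
    energy = 0.0
    for column, base in zip(pwm, site):
        energy += math.log(max(column.values()) / column[base]) / lam
    return energy

def binding_affinity(pwm, promoter, r0, lam=0.7):
    """Sum p_l = R0*exp(-E_l) / (1 + R0*exp(-E_l)) over every window of width W."""
    width = len(pwm)
    total = 0.0
    for start in range(len(promoter) - width + 1):
        site = promoter[start:start + width]
        weight = r0 * math.exp(-site_energy(pwm, site, lam))
        total += weight / (1.0 + weight)
    return total

# Toy PWM (hypothetical frequencies) whose preferred site is "ACGT".
toy_pwm = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]

with_site = binding_affinity(toy_pwm, "TTTTACGTTTTT", r0=0.5)
without_site = binding_affinity(toy_pwm, "TTTTTTTTTTTT", r0=0.5)
```

In this toy setting a promoter containing the preferred site scores far higher than one without it, which is the behaviour the affinity score is meant to capture.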

FIGURE 1 | Schema of the whole study. (A) The workflow of the whole project. (B) The schema of GSEA analysis for a particular TF.

Based on Equation (2), $R_0$ is a constant for a TF at a given
concentration. Therefore, we only need to consider the influence of the
parameter λ on the binding affinity score. In practice, we used a series
of values of λ (i.e., 0.5, 0.6, 0.65, 0.7, 0.75, 0.8, 0.9, with 0.7 being the
default value) to calculate the score. For a given value of λ, its
impact on the score was evaluated by counting the number of genes
among the top 1000 candidates that overlapped with the top 1000
candidates predicted with a λ of 0.7.
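
The overlap comparison described above can be sketched as follows. This is a minimal example under stated assumptions: the function name `top_k_overlap` and the toy score dictionaries are hypothetical, and `k` is set to 3 here only so that the toy data stay small (the study used the top 1000 genes).

```python
def top_k_overlap(scores_ref, scores_alt, k=1000):
    """Fraction of the top-k genes under scores_ref that are also top-k under scores_alt."""
    top_ref = set(sorted(scores_ref, key=scores_ref.get, reverse=True)[:k])
    top_alt = set(sorted(scores_alt, key=scores_alt.get, reverse=True)[:k])
    if not top_ref:
        return 0.0
    return len(top_ref & top_alt) / len(top_ref)

# Hypothetical affinity scores for the same genes under two parameter values.
scores_default = {"g1": 5.0, "g2": 4.0, "g3": 3.0, "g4": 2.0, "g5": 1.0}
scores_other = {"g1": 4.5, "g2": 2.5, "g3": 3.5, "g4": 3.0, "g5": 0.5}

overlap = top_k_overlap(scores_default, scores_other, k=3)
```

An overlap close to 1 for every tested parameter value would indicate that the ranking of candidate targets is largely insensitive to that parameter.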



Identifying transcription factors involved in photosynthesis 

We first tested the impact of modifying λ on the prediction of TFs
and their targets. We found that varying λ had little influence on the
top 1000 genes. Therefore, in this study, we used a λ value of
0.7. With this value, we calculated binding affinities between the 124
TFs and their target genes with TRAP. Then, we used a modified
Fisher's test to identify TFs whose target genes were significantly
enriched in photosynthesis genes: (1) for the binding affinity scores,
we set a series of score cutoffs spanning from 0.5 to
5.0 with a step size of 0.1. When a gene's binding
affinity score was higher than the cutoff, it was regarded as
a potential target of the TF. (2) A modified Fisher's test was
conducted to calculate the p-value and judge whether the targets were
enriched in the photosynthesis gene set relative to the background gene
set. For each TF, a series of p-values was calculated based on the
different cutoff values, and the minimal p-value was then selected.
When this p-value was less than 0.001, the TF was regarded as a
TF targeting photosynthesis genes.
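
The cutoff scan in steps (1) and (2) can be sketched as below. The text does not specify how the Fisher's test was modified, so this example assumes a standard one-sided hypergeometric (Fisher-type) enrichment test; the function names and toy scores are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(hits, n_targets, n_photo, n_total):
    """P(X >= hits) when n_targets genes are drawn from n_total, n_photo of them photosynthetic."""
    upper = min(n_targets, n_photo)
    numerator = sum(comb(n_photo, i) * comb(n_total - n_photo, n_targets - i)
                    for i in range(hits, upper + 1))
    return numerator / comb(n_total, n_targets)

def min_enrichment_p(scores, photo_genes, cutoffs):
    """Scan score cutoffs and return (minimal p-value, cutoff at which it occurs)."""
    n_total = len(scores)
    n_photo = sum(1 for gene in scores if gene in photo_genes)
    best = (1.0, None)
    for cutoff in cutoffs:
        targets = [gene for gene, score in scores.items() if score > cutoff]
        if not targets:
            continue
        hits = sum(1 for gene in targets if gene in photo_genes)
        p_value = hypergeom_upper_tail(hits, len(targets), n_photo, n_total)
        if p_value < best[0]:
            best = (p_value, cutoff)
    return best

# Cutoffs 0.5, 0.6, ..., 5.0 as described in the text.
cutoffs = [round(0.5 + 0.1 * i, 1) for i in range(46)]

# Toy data: 16 background genes with low scores, 4 photosynthesis genes with high scores.
toy_scores = {f"bg{i}": 0.2 for i in range(16)}
toy_scores.update({f"photo{i}": 3.0 for i in range(4)})
toy_photo = {f"photo{i}" for i in range(4)}

best_p, best_cutoff = min_enrichment_p(toy_scores, toy_photo, cutoffs)
is_photosynthesis_tf = best_p < 0.001
```

Taking the minimum over many cutoffs makes the selected p-value optimistic, which is one reason a strict threshold such as 0.001 is a sensible companion to this kind of scan.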

Identifying TF-target pairs from microarray data 

Microarray data from mature leaves of Arabidopsis (generated on the
Affymetrix GPL198 platform) were downloaded from the GEO
database (http://www.ncbi.nlm.nih.gov/geo/). Altogether, 391
experiments with 5626 chips were obtained (Data Sheet 3). The
ARACNE algorithm, which uses mutual information between TFs
and their target genes, requires that the genes involved be
differentially expressed (Basso et al., 2005; Margolin et al., 2006).
We therefore selected experiments in which more than half of the
TFs of interest have a relatively high expression diversity, i.e., a
coefficient of variation of the expression values higher than 0.1.
After this filtering step, we obtained 23 experiments with a total of
454 chips. These chips were normalized across experiments using
"normalizeBetweenArrays" implemented in the R package "limma"
(Smyth, 2005). In this study, some TFs' PWMs were not derived
from Arabidopsis. For these TFs, we identified their best orthologs
in Arabidopsis by querying their sequences against the
Arabidopsis protein database with the BLASTP program. We then
used the ARACNE algorithm to identify TF-target
pairs with a threshold p-value of $10^{-7}$. Finally, for each TF, its
target genes were put into a list in which genes were ranked in
descending order of mutual information.
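
The experiment filter described above (more than half of the TFs of interest with a coefficient of variation above 0.1) can be sketched as follows. The data layout, the function names, and the use of the population standard deviation are assumptions made for illustration; the expression values are invented.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form, an assumption)."""
    average = mean(values)
    return pstdev(values) / average if average else 0.0

def keep_experiment(expression_by_tf, threshold=0.1):
    """True if more than half of the TFs have a CV above the threshold in this experiment."""
    variable = sum(1 for values in expression_by_tf.values()
                   if coefficient_of_variation(values) > threshold)
    return variable > len(expression_by_tf) / 2

# Hypothetical expression values of three TFs across the chips of two experiments.
experiment_a = {"TF1": [8.0, 10.0, 12.0], "TF2": [5.0, 7.0, 9.0], "TF3": [6.0, 6.0, 6.1]}
experiment_b = {"TF1": [8.0, 8.1, 8.0], "TF2": [7.0, 7.0, 7.1], "TF3": [6.0, 6.0, 6.1]}

selected = [name for name, data in (("A", experiment_a), ("B", experiment_b))
            if keep_experiment(data)]
```

Experiments in which the TFs barely vary carry little information for a mutual-information estimate, which is the motivation for discarding them before running ARACNE.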

Coherence analysis of two methods for photosynthetic genes 

For a given TF, a ranked list of target genes based on mutual
information was constructed by the ARACNE algorithm. The
GSEA software (Mootha et al., 2003; Subramanian et al.,
2005) was then used to test whether the TF's target genes
identified by the TRAP algorithm are enriched in this ranked
gene list (Figure 1B).
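
To make the idea of the coherence test concrete, the sketch below computes an unweighted running-sum enrichment statistic and a permutation p-value. It is a simplification written for illustration and not the GSEA software itself: the function names, the unweighted scoring, and the gene-set permutation scheme are all assumptions.

```python
import random

def enrichment_score(ranked_genes, gene_set):
    """Unweighted running sum: +1/hits for set members, -1/misses for other genes."""
    hits = sum(1 for gene in ranked_genes if gene in gene_set)
    misses = len(ranked_genes) - hits
    if hits == 0 or misses == 0:
        return 0.0
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += 1.0 / hits if gene in gene_set else -1.0 / misses
        if abs(running) > abs(best):
            best = running
    return best

def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Share of random gene sets of equal size scoring at least as high as the observed set."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    size = sum(1 for gene in ranked_genes if gene in gene_set)
    count = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        if enrichment_score(ranked_genes, random_set) >= observed:
            count += 1
    return count / n_perm

# Hypothetical ranking (e.g., by mutual information) and two toy target sets.
ranking = [f"g{i}" for i in range(20)]
top_set = {"g0", "g1", "g2"}
bottom_set = {"g17", "g18", "g19"}

es_top = enrichment_score(ranking, top_set)
es_bottom = enrichment_score(ranking, bottom_set)
p_top = permutation_p_value(ranking, top_set)
```

A target set concentrated at the top of the ranking yields a large positive score and a small p-value, which is the pattern reported as coherence between the two methods.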

Network properties of the regulatory network 

The final genetic regulatory network was constructed by integrat- 
ing the TF-target relationship predicted with both methods. We 
calculated a number of network properties for this network: (1) 
the degree of a node, which is the number of edges connected to 
node; (2) the diameter of a network, which is the longest value of 
all the shortest paths in the network; (3) clustering coefficient of 
a node, which is the ratio of existing edges linking a node's neigh- 
bors to each other to the maximum possible number of such edges 
among the neighbors of a node. 
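
The three properties defined above can be computed on a small undirected graph as sketched below. The helper names and the toy edge list are hypothetical, and the diameter function assumes the longest shortest path is taken within connected components.

```python
from collections import deque
from itertools import combinations

def build_graph(edges):
    """Undirected adjacency sets from a list of (node, node) edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def diameter(graph):
    """Longest of all shortest paths between reachable node pairs."""
    return max(max(shortest_path_lengths(graph, node).values()) for node in graph)

def clustering_coefficient(graph, node):
    """Existing edges among a node's neighbors divided by the maximum possible number."""
    neighbors = graph[node]
    if len(neighbors) < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return links / (len(neighbors) * (len(neighbors) - 1) / 2)

# Toy network with hypothetical node names.
toy_graph = build_graph([
    ("TF1", "geneA"), ("TF1", "geneB"), ("TF1", "geneC"),
    ("geneA", "geneB"), ("geneC", "geneD"),
])
degree_tf1 = len(toy_graph["TF1"])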



RESULTS AND DISCUSSION 

A FEASIBLE STRATEGY FOR IDENTIFICATION OF TFS RELATED TO 
PHOTOSYNTHESIS 

In this study, we first tested whether the prediction results of the
TRAP algorithm are sensitive to the parameter λ. As shown in Data
Sheet 1 (see Supplemental Data), for most of the transcription
factors, more than 90% of the top 1000 genes generated with different
λ values were identical. Results of ten selected TFs are shown
in Figure 2. These results suggest that predictions from the TRAP
algorithm are not sensitive to the parameter λ. Hence, in this study,
we used the default value (0.7) of λ to calculate the binding affinity
between TFs and candidate genes in Arabidopsis. Subsequently, a
modified Fisher's test was used to identify TFs related to
photosynthesis (p-value < 0.001). In total, we identified 13 TFs that
were related to photosynthesis (Table 1), among which 8 TFs
have been reported earlier to be closely related to the regulation
of photosynthesis (Table 1). Additionally, although RITA-1
has not been reported as a regulator of photosynthesis, its ortholog
in Arabidopsis, bZIP9, can form heterodimers with bZIP25
(Opaque-2) and bZIP63 (CPRF-2), which play an important role
in the regulation of light-induced genes (annotated in UniProtKB;
http://www.uniprot.org). It is therefore possible that RITA-1 is also
an important regulator of photosynthesis. Overall, the strategy
used in our study to recognize TFs involved in photosynthesis
is feasible, since 9 out of 13 TFs (around 70%) are verified as
important regulators of photosynthesis.

FIGURE 2 | Comparison of results for different values of the parameter λ in
the TRAP algorithm. We used a number of values of λ, i.e., 0.5, 0.6, 0.65,
0.75, 0.8, 0.9, to calculate the binding affinity between TFs and all the
genes in the Arabidopsis genome using TRAP. Then, for each TF, we chose the
top 1000 target genes with the highest binding affinity. We further calculated
the overlap between this gene set and the top 1000 genes identified under
λ = 0.7 (the default value). The horizontal axis denotes the values of λ used,
while the vertical axis denotes the ratio of overlap between these two gene
sets. The result shows that TRAP is not sensitive to the value of λ. Here the
overlaps for only two TFs are shown (see Supplemental Table 1 for all
results). M00503, ATHB-5, from Arabidopsis thaliana; M00356, bZIP910, bZIP
transcription factor from Antirrhinum majus.

Table 1 | Summary of TFs related to photosynthesis.

| TF | Species | PWM | Function | Homolog in Arabidopsis |
|---|---|---|---|---|
| Opaque-2 | Zea mays | NCCACGTVRN | Activation of cyPPDK1; additive increase in combination with PBF | BZIP25 (AT3G54620) |
| CPRF-1 | Petroselinum crispum | KMCACGTGKM | Regulate light-induced genes | GBF3 (AT2G46270) |
| CPRF-3 | Petroselinum crispum | NHSACGTSDN | Regulate light-induced genes | GBF1 (AT4G36730) |
| RITA-1 | Oryza sativa | YSACGTR | The rice bZIP transcriptional activator RITA-1 is highly expressed during seed development | BZIP9 (AT5G24800) |
| EmBP-1 | Zea mays | GCCACGTGAN | Can activate transcription from a truncated promoter containing a pentamer of the O2 site in yeast cells | GBF1 (AT4G36730) |
| CPRF-2 | Petroselinum crispum | NHCACGTGDN | Regulate light-induced genes | BZIP63 (AT5G28770) |
| TGA1b | Nicotiana tabacum | DHSACGTSDB | Binds specifically to the DNA sequence 5'-TGACG-3' | AT2G40950 |
| HBP-1a | Triticum aestivum | GNCACGTGGC | Binds to the hexamer motif 5'-ACGTCA-3' of histone gene promoters | BZIP16 (AT2G35530) |
| TAF-1 | Nicotiana tabacum | GCCACGTGGC | Binds to a G-box-related element (5'-GCAACGTGGC-3'). Also binds to the HEX-motif of wheat histone H3 promoter | GBF3 (AT2G46270) |
| PIF1 | Arabidopsis thaliana | GNCACGTGRN | Negatively regulates chlorophyll biosynthesis and seed germination in the dark; light-induced degradation of PIF1 relieves this negative regulation to promote photomorphogenesis | AT2G20180 |
| GBF1 | Arabidopsis thaliana | TKCCACGTGGCM | Binds to G-box motif (5'-CCACGTGG-3') of rbcS-1A gene promoter. Regulate light-induced genes | AT4G36730 |
| GAMYB | Hordeum vulgare | NNSCRRYAACNVA | Transcriptional activator of gibberellin-dependent alpha-amylase expression in aleurone cells | MYB33 (AT5G06100) |
| ARR10 | Arabidopsis thaliana | AGATHYK | Functions as a response regulator involved in His-to-Asp phosphorelay signal transduction system | AT4G31920 |

The annotation information of the transcription factors was collected from the TRANSFAC and UniProt databases.

COHERENCE ANALYSIS OF PHOTOSYNTHETIC GENE SETS

After identifying TFs related to photosynthesis with the TRAP
algorithm, homologs of these TFs in Arabidopsis were collected
with the BLASTP program at NCBI (www.ncbi.nlm.nih.gov).
As a result, 10 TFs in Arabidopsis and their corresponding targets
(set1) were obtained. For each of these 10 TFs, we used the
ARACNE software (Basso et al., 2005; Margolin et al., 2006)
to calculate mutual information between the TF and all genes
in Arabidopsis. The genes associated with the TF were then ranked
by mutual information score. After that, for each TF,
a coherence analysis was conducted between the targets identified by
TRAP and the ranking generated by the ARACNE software
(Table 2). As shown in Table 2, genes of set1 are significantly
enriched at the top of the rankings for almost all TFs except HBP-1a.
These results suggest that the TFs regulating photosynthetic
gene sets identified by TRAP and ARACNE were consistent with
each other.

Table 2 | Results of coherence analysis for TF targets identified by the TRAP and ARACNE algorithms.

| TFs | Homologs | TF targets (set1): Num | TF targets (set1): NOM p-val |
|---|---|---|---|
| PIF1 | AT2G20180 | 26 | 0.23673469 |
| HBP-1a | AT2G35530 | 12 | 0.82678 |
| TGA1b | AT2G40950 | 23 | <0.001 |
| CPRF-1 (TAF1) | AT2G46270 | 30 | <0.001 |
| Opaque-2 | AT3G54620 | 28 | 0.002079002 |
| ARR10 | AT4G31920 | 5 | 0.08054523 |
| CPRF-3, EmBP-1, GBF1 | AT4G36730 | 43 | <0.001 |
| GAMYB | AT5G06100 | 67 | 0.008008008 |
| RITA-1 | AT5G24800 | 28 | 0.15663901 |
| CPRF-2 | AT5G28770 | 53 | 0.069 |

FIGURE 3 | Gene regulatory network of photosynthesis in
Arabidopsis. (A) Nodes in different colors represent genes involved in
different sections of photosynthesis. We have divided the
photosynthesis system into the following sections: ATPase, enzymes
related to C4 photosynthesis, Calvin cycle, light harvesting complex,
photosynthetic electron transfer, photosystem I, and photosystem II.
(B) Transcription factor-component network. Photosynthesis genes are
assigned to components, and a transcription factor and a component
are linked if the transcription factor is linked to one of the genes in
the component in network (A).

IDENTIFIED TFs RELATED TO PHOTOSYNTHESIS REGULATION 

Table 1 lists the TFs identified by the TRAP algorithm. Among
these TFs, many belong to the family of G-box binding proteins
[22]. The G-box is a cis-acting element, 5'...CACGTG...3',
identified in the promoters of many plant genes (Williams et al., 1992).
G-box binding proteins form a large gene family, of which CPRF-1,
CPRF-2, CPRF-3, EmBP-1, TAF-1, Opaque-2, and GBF1 are
members (Siberil et al., 2001) (Table 1). CPRF-3 and
EmBP-1 are homologs of Arabidopsis GBF1, while CPRF-1 and
TAF-1 are homologs of Arabidopsis GBF3 (Table 1). GBF1 exists
in the nuclei of tomato and Arabidopsis and can interact with
the G-box motif in the promoters of many rbcS isoforms, while
GBF3 shares the same binding motif with GBF1 (Giuliano
et al., 1988). Earlier reports have shown that GBF1 and HY5
form a DNA-binding heterodimer at the rbcS-1A promoter; but
unlike HY5, GBF1 is a negative regulator of rbcS-1A [24].
The GBF1-HY5 heterodimer is also a positive regulator of CAB1,
which promotes accumulation of chlorophyll in cells and plays
an important role in blue-light-induced photomorphogenesis
(Singh et al., 2012). A previous study also suggested that in
Arabidopsis the Pro-rich activation domain of GBF1 can interact
with GLK2 and GLK1, which regulate chloroplast development
in diverse plant species (Tamai et al., 2002). This evidence
suggests that GBF1 is an important TF regulating photosynthesis.
Another identified G-box binding protein, CPRF-2, can
form heterodimers with CPRF-1 or CPRF-3 (Armstrong et al.,
1992). Therefore, these 6 G-box TFs are potentially important
regulators of photosynthesis. Opaque-2 (O2), another G-box
binding factor, is also identified as an important regulator
controlling the expression of photosynthetic genes (Table 1). Opaque-2
has been reported to control the expression of a cytosolic form
of pyruvate orthophosphate dikinase-1 (cyPPDK1) (Maddaloni
et al., 1996), a key enzyme in C4 photosynthesis. In the
o2 mutant of maize, the expression of multiple photosynthetic genes,
including PEPC and ME, is down-regulated (Hartings
et al., 2011). In addition, EmBP-1 can bind to the same site
as Opaque-2 in the same promoters, but inhibits the
Opaque-2-regulated transcription from these promoters (Carlini
et al., 1999). We also identified PIF1, which is a negative
regulator of chlorophyll biosynthesis and seed germination in
darkness; light-induced degradation of PIF1 relieves this negative
regulation and promotes photomorphogenesis (Moon et al.,
2008).

PROPERTIES OF THE GENE REGULATORY NETWORK OF 
PHOTOSYNTHESIS 

A gene regulatory network of photosynthesis was established
by combining TF-target pairs identified by both the TRAP and
the ARACNE algorithms (Figure 3A). Topology analysis was then
conducted with the "NetworkAnalyzer" module (Assenov et al.,
2008) of the Cytoscape software (http://www.cytoscape.org/).
Topology analysis of the network showed that it has a diameter
of 6; the average number of neighboring nodes and the average
length of the shortest path are 4 and 3, respectively (Figures 4A,B).
In addition, the distribution of node degree in the network follows
a power-law distribution (Figure 4C). These results indicate that
the photosynthetic network is a scale-free network, i.e., most
of the components of photosynthesis are regulated by a relatively
small number of regulators, i.e., TFs in this case. This small
number of regulators can be regarded as hubs of the photosynthetic
regulatory network and may play a crucial role in coordinating the
expression of genes involved in photosynthesis. In the regulatory
network that we obtained, the components of PSI, PET, and PSII
are co-regulated by the TFs TGA1b, PIF1, Opaque-2,
and CPRF-2 (Figure 3B). Furthermore, genes from the Calvin cycle
and the F-type ATP synthase are co-regulated by the TFs GAMYB and
CPRF-2 (Figure 3B). Detailed reverse genetics studies should be
conducted to study this coordination and its physiological
significance.
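
The two steps behind Figure 3, keeping only TF-target pairs supported by both methods and then linking each TF to the components of its targets, can be sketched as follows. All TF names, gene names, and pairings in this example are hypothetical placeholders, and the function names are assumptions.

```python
def intersect_edges(trap_pairs, aracne_pairs):
    """Keep only the TF-target pairs predicted by both methods."""
    return set(trap_pairs) & set(aracne_pairs)

def tf_component_links(edges, gene_to_component):
    """Link a TF to a component if the TF is linked to any gene of that component."""
    links = {}
    for tf, gene in edges:
        component = gene_to_component.get(gene)
        if component is not None:
            links.setdefault(tf, set()).add(component)
    return links

# Hypothetical predictions from the two methods.
trap_pairs = {("TF_x", "gene1"), ("TF_x", "gene2"), ("TF_y", "gene3"), ("TF_y", "gene4")}
aracne_pairs = {("TF_x", "gene1"), ("TF_x", "gene2"), ("TF_y", "gene3"), ("TF_z", "gene4")}
gene_to_component = {"gene1": "PSI", "gene2": "PSII", "gene3": "CC", "gene4": "FTA"}

network_edges = intersect_edges(trap_pairs, aracne_pairs)
links = tf_component_links(network_edges, gene_to_component)

# TFs linked to two or more components are candidates for coordinating them.
coordinating_tfs = sorted(tf for tf, components in links.items() if len(components) >= 2)
```

Requiring support from both methods trades sensitivity for specificity, so edges supported by only one method (such as the two `gene4` pairs here) are dropped.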
FIGURE 4 | Topology properties of the photosynthetic network.
(A) Distribution of neighbors for nodes; (B) Distribution of shortest path
length for the network; (C) Plot of number of nodes vs. node degree.
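
A quick way to inspect whether a degree distribution such as the one in panel (C) is compatible with a power law is to fit a straight line to the counts on log-log axes. The sketch below does this with ordinary least squares; it is a crude check under stated assumptions, not necessarily the procedure used by NetworkAnalyzer, and the function name and toy degrees are hypothetical.

```python
import math
from collections import Counter

def loglog_slope(degrees):
    """Least-squares slope of log(number of nodes) against log(degree)."""
    counts = Counter(d for d in degrees if d > 0)
    xs = [math.log(k) for k in counts]
    ys = [math.log(c) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct degrees")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy degree list in which the number of nodes falls as degree^-2.
toy_degrees = [1] * 16 + [2] * 4 + [4] * 1
slope = loglog_slope(toy_degrees)
```

A clearly negative slope with points lying close to the fitted line is suggestive of a power law, although with a small network such a fit should be read with caution.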
CONCLUSIONS

In this study, a genetic regulatory network of photosynthesis in
Arabidopsis was constructed by combining genomic
sequence, TF binding information, and gene expression data.
We identified a number of novel transcription factors related
to photosynthesis. The identified network has the scale-free
property. Potential hubs in the network that coordinate various
components of photosynthesis were also identified. Transgenic
experiments are now under way to test the consequences
of manipulating the expression of these TFs for photosynthetic
performance.